End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning

نویسندگان

  • Jason D. Williams
  • Geoffrey Zweig
چکیده

This paper presents a model for end-toend learning of task-oriented dialog systems. The main component of the model is a recurrent neural network (an LSTM), which maps from raw dialog history directly to a distribution over system actions. The LSTM automatically infers a representation of dialog history, which relieves the system developer of much of the manual feature engineering of dialog state. In addition, the developer can provide software that expresses business rules and provides access to programmatic APIs, enabling the LSTM to take actions in the real world on behalf of the user. The LSTM can be optimized using supervised learning (SL), where a domain expert provides example dialogs which the LSTM should imitate; or using reinforcement learning (RL), where the system improves by interacting directly with end users. Experiments show that SL and RL are complementary: SL alone can derive a reasonable initial policy from a small number of training dialogs; and starting RL optimization with a policy trained with SL substantially accelerates the learning rate of RL.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-toend approaches, HC...

متن کامل

Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent QNetworks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy. Moreover, we propose a hybrid algorithm that combines the strength of reinforcement learning and supervised learning to achieve fa...

متن کامل

End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning

In this paper, we present a neural network based task-oriented dialogue system that can be optimized end-to-end with deep reinforcement learning (RL). The system is able to track dialogue state, interface with knowledge bases, and incorporate query results into agent’s responses to successfully complete task-oriented dialogues. dialogue policy learning is conducted with a hybrid supervised and ...

متن کامل

Online Sequence-to-Sequence Active Learning for Open-Domain Dialogue Generation

We propose an online, end-to-end, neural generative conversational model for open-domain dialog. It is trained using a unique combination of offline two-phase supervised learning and online human-inthe-loop active learning. While most existing research proposes offline supervision or hand-crafted reward functions for online reinforcement, we devise a novel interactive learning mechanism based o...

متن کامل

Sample-efficient Deep Reinforcement Learning for Dialog Control

Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural, but is sample inefficient. In this paper, we present 3 methods for reducing the number of dialogs require...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1606.01269  شماره 

صفحات  -

تاریخ انتشار 2016